The Factored Policy Gradient planner (IPC-06 Version)
Authors
Abstract
We present the Factored Policy Gradient (FPG) planner: a probabilistic temporal planner designed to scale to large planning domains by applying two significant approximations. First, we use a “direct” policy search in the sense that we attempt to directly optimise a parameterised plan using gradient ascent. Second, the policy is factored into a per-action mapping from a partial observation to the probability of executing that action, reflecting how desirable each action is. These two approximations, plus memory use that is independent of the number of states, allow us to scale to significantly larger planning domains than were previously feasible. Unlike other probabilistic temporal planners, FPG can attempt to optimise both makespan and the probability of reaching the goal. The version of FPG used in the IPC-06 competition optimises makespan only and turns off concurrent planning.
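As a rough illustration of the factored policy described above, the sketch below gives each action its own weight vector mapping an observation to a per-action score, normalises the scores into execution probabilities, and trains the weights by gradient ascent on a REINFORCE-style estimate. The class name, the softmax normalisation, and the toy reward are illustrative assumptions, not FPG's actual implementation.

```python
import math
import random

def softmax(scores):
    # Numerically stable softmax over per-action scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

class FactoredPolicy:
    """One weight vector per action: observation -> execution probability.
    (Illustrative sketch only; FPG's actual parameterisation may differ.)"""
    def __init__(self, n_actions, n_features, lr=0.05):
        self.w = [[0.0] * n_features for _ in range(n_actions)]
        self.lr = lr

    def probs(self, obs):
        scores = [sum(wi * oi for wi, oi in zip(wa, obs)) for wa in self.w]
        return softmax(scores)

    def sample(self, obs, rng=random):
        r, acc = rng.random(), 0.0
        for a, pa in enumerate(self.probs(obs)):
            acc += pa
            if r <= acc:
                return a
        return len(self.w) - 1

    def update(self, obs, action, reward):
        # REINFORCE gradient ascent:
        # d log pi(action|obs) / d w[b] = (1[b == action] - pi(b)) * obs
        p = self.probs(obs)
        for b, wb in enumerate(self.w):
            coeff = self.lr * reward * ((1.0 if b == action else 0.0) - p[b])
            for i, oi in enumerate(obs):
                wb[i] += coeff * oi

# Toy problem: two actions, one bias feature; only action 0 is rewarded.
rng = random.Random(0)
policy = FactoredPolicy(n_actions=2, n_features=1)
for _ in range(1000):
    obs = [1.0]
    a = policy.sample(obs, rng)
    policy.update(obs, a, reward=1.0 if a == 0 else 0.0)
```

After training on the toy problem, the policy's probability of executing the rewarded action should dominate, which is the behaviour gradient ascent on the parameterised policy is meant to produce.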
Similar references
FF + FPG: Guiding a Policy-Gradient Planner
The Factored Policy-Gradient planner (FPG) (Buffet & Aberdeen 2006) was a successful competitor in the probabilistic track of the 2006 International Planning Competition (IPC). FPG is innovative because it scales to large planning domains through the use of Reinforcement Learning. It essentially performs a stochastic local search in policy space. FPG’s weakness is potentially long learning time...
Concurrent Probabilistic Temporal Planning with Policy-Gradients
We present an any-time concurrent probabilistic temporal planner that includes continuous and discrete uncertainties and metric functions. Our approach is a direct policy search that attempts to optimise a parameterised policy using gradient ascent. Low memory use, plus the use of function approximation methods, plus factorisation of the policy, allow us to scale to challenging domains. This Fa...
Policy-Gradient Methods for Planning
Probabilistic temporal planning attempts to find good policies for acting in domains with concurrent durative tasks, multiple uncertain outcomes, and limited resources. These domains are typically modelled as Markov decision problems and solved using dynamic programming methods. This paper demonstrates the application of reinforcement learning — in the form of a policy-gradient method — to thes...
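As background for the policy-gradient approach this abstract refers to, the gradient of the expected return can be estimated from sampled trajectories. The form below is the standard REINFORCE estimator, given here for orientation; the paper itself may use a related estimator such as GPOMDP.

```latex
% eta(theta): expected return of policy pi_theta; tau_n: the n-th sampled
% trajectory with observations o_t and actions a_t; R(tau_n): its return.
\nabla_\theta \eta(\theta) \approx \frac{1}{N} \sum_{n=1}^{N} R(\tau_n)
    \sum_{t} \nabla_\theta \log \pi_\theta(a_t \mid o_t)
```

Because the estimate needs only sampled trajectories and the policy's own log-probabilities, memory use is independent of the number of states, which is the property the abstracts above rely on for scaling.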
mGPT: A Probabilistic Planner Based on Heuristic Search
We describe the version of the GPT planner used in the probabilistic track of the 4th International Planning Competition (ipc-4). This version, called mGPT, solves Markov Decision Processes specified in the ppddl language by extracting and using different classes of lower bounds along with various heuristic-search algorithms. The lower bounds are extracted from deterministic relaxations where t...
Planning for Welfare to Work
We are interested in building decision-support software for social welfare case managers. Our model in the form of a factored Markov decision process is so complex that a standard factored MDP solver was unable to solve it efficiently. We discuss factors contributing to the complexity of the model, then present a receding horizon planner that offers a rough policy quickly. Our planner computes ...